This README describes how to replicate the analysis in Advani, Koenig, Pessina and Summers, "Immigration and the Top 1%". The replication code is provided as R scripts.

The data required to run the code can be obtained by approved researchers via the HMRC Datalab. Data may only be accessed within the HMRC Datalab secure research facility. To apply for access to HMRC administrative data, see https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/801989/HMRC_DatalabProjectProposalApplication.odt

The relevant datasets are:
1. "Self Assessment Valid Views" 1997-2018: this dataset contains income and demographic information for individuals who filed an income tax self assessment return.
2. "PAYE" 1997-2018: this dataset contains information on earnings and pension income for individuals who had income tax withheld at source by an employer or pension provider.
3. Data from the ONS: these data are freely available from the ONS website, and download links are provided in the code files at the points where the data are needed. The user would need to have these data imported into the Datalab.
Full variable lists and documentation for the HMRC datasets above are available within the HMRC Datalab facility. The code provided includes variable labels for the variables that are used in the analysis.

Because the data may not be provided outside the Datalab facility, the research can be replicated by requesting a new project within the HMRC Datalab, requesting that the code files in this archive be imported into the lab, and running the code within the secure facility.

The first file included is 0_master.R. Running it loads the required functions (files beginning "fn"), sets up some key parameters, runs the data construction code sequentially (files with names beginning 1 to 7), and then runs the "graph_code" files, which produce the tables and figures for the paper.
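
The run order described above can be sketched in R as follows. This is an illustration only, not the contents of 0_master.R: the helper names ordered_scripts and run_pipeline are hypothetical, and the file name patterns (a "fn" prefix, a leading digit from 1 to 7, a "graph_code" prefix) are assumptions based on the description in this README. The parameter set-up step that 0_master.R performs is not reproduced here.

```r
# Hypothetical helper: put script paths in the order this README describes.
# The name patterns are assumptions; the actual file names may differ.
ordered_scripts <- function(files) {
  base <- basename(files)
  fn_files    <- sort(files[grepl("^fn", base)])          # function definitions
  build_files <- sort(files[grepl("^[1-7]", base)])       # data construction, 1 to 7
  graph_files <- sort(files[grepl("^graph_code", base)])  # tables and figures
  c(fn_files, build_files, graph_files)
}

# Hypothetical driver: source every script found in code_dir, in that order.
# 0_master.R itself is not matched by any pattern, so it is never re-sourced.
run_pipeline <- function(code_dir = ".") {
  files <- list.files(code_dir, pattern = "\\.R$", full.names = TRUE)
  for (f in ordered_scripts(files)) {
    message("Running ", f)
    source(f, echo = FALSE)
  }
  invisible(NULL)
}
```

Calling run_pipeline() from the directory that holds the code files would source them in the order above. In practice, running 0_master.R directly is the intended entry point.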
